This is page 1 
Printer: Opaque this 



5h 



O 

On 



Quantifying Rapid Variability in 
Accreting Compact Objects 



t*"- ■ M. van der Klis 

On " 
On 



ABSTRACT I discuss some practical aspects of the analysis of millisecond 
time variability X-ray data obtained from accreting neutron stars and black 
holes. First I give an account of the statistical methods that are at present 
, commonly applied in this field. These are mostly based on Fourier tech- 

00 ■ niques. To a large extent these methods work well: they give astronomers 

£SJ ' the answers they need. Then I discuss a number of statistical questions that 

astronomers don't really know how to solve properly and that statisticians 
may have ideas about. These questions have to do with the highest and 
the lowest frequency ranges accessible in the Fourier analysis: how do you 
. determine the shortest time scale present in the variability, how do you 

measure steep low-frequency noise. The point is stressed that in order for 
any method that resolves these issues to become popular, it is necessary 
to retain the capabilities the current methods already have in quantify- 
ing the complex, concurrent variability processes characteristic of accreting 
neutron stars and black holes. 



O ■ 1 Introduction 

The purpose of this talk is to explain to statisticians how astrophysicists, 
mostly using Fourier transform techniques, go about analyzing X-ray time- 
series data obtained from accreting compact objects (neutron stars and 
Jv>( , black holes), and to point out a few problems with the usual approaches. 

5_j ■ The point will be made, that the conglomerate of statistical methods that 

is being applied in this branch of high-energy astrophysics, even though 
most definitely not always rigorous, on the whole serves its purpose well 
and is providing astronomers with the quantitative answers they require. 
This talk will aim, however, at a few areas where we run into problems, and 
where more statistical expertise might help. In thinking about how to solve 
the problems that I shall outline, it will be important to keep an eye on 
what the capabilities of the existing methods are, as those capabilities will 
need to be preserved in whatever new approach one would like to propose. 

Mostly, accreting neutron stars and black holes occur in double star 
systems known as X-ray binary stars, where a normal star and the compact 
object are in a close orbit around each other (Fig.[l]). Matter flows from the 
normal star to the compact object by way of a flat, spiraling flow called 
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an accretion disk, and finally accretes onto the compact object. A large 
amount of energy is released in this process (typically 10 36 to 10 38 erg/sec), 
and is emitted, mostly in the form of X-rays. The characteristic variability 
timescale for the X-ray emitting regions is predicted (and observed) to be 
very short (less than a millisecond). By studying the properties of this 
rapid X-ray variability it is possible to extract a great deal of information 
about the flow of matter onto the compact object, and, indirectly, about 
the object itself. See van der Klis (1995) for a recent review of the results 
of studies of this type. 

Secondary 




Star 



FIGURE 1. X-ray binary star. 



2 The data 

In order to understand the character of the date we are dealing with, I 
shall follow the flow of information from star to computer. An X-ray binary 
star emits X-ray photons at a very large rate (say, 10 46 photons/sec). For 
all practical purposes, the X-ray photon rate produced by the star can be 
considered as a continuous function of time I(t). These photons are emitted 
isotropically, or at least over a solid angle of order An sterad. Because the 
X-ray detector onboard the X-ray satellite spans only a very small solid 
angle as seen from the X-ray star, only a very small fraction e (say, 10~ 43 ) 
of the photons is detected in the instrument. The time series of photon 
arrival times U,i — 1, . . . , N p h tot (with N p h tot the total number of detected 
photons) is the information that, ideally, we would like to have available for 
analysis. However, typically, instrumental limitations (and the maximum 
telemetry rate) prevent the registration and transmission of all these arrival 
times. Therefore, onboard the satellite, the data are binned into equidistant 
time bins. The information that is finally telemetered to the ground station 
consists of a sequence of numbers x m ,m = 0, . . . , N to t — 1, where N to t is 
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the total number of time bins, x m representing the number of photons 
that was detected during time bin m. In most cases, and unlike the usual 
case in astronomy, the time bins are equidistant and contiguous (no gaps). 
The statistical problem facing us can be summarized as follows: "Given 
x m , m = 0, . . . , N to t — lj reconstruct as much as possible about I(t)" . 

We shall assume that for the bright X-ray binaries that I am discussing 
here, the rate of background photons can be considered to be negligible, 
and that there are no other relevant effects affecting the photon arrival time 
series than the huge geometrical dilution factor just described^] Therefore, 
if the X-ray star does not vary intrinsically, the ti will to a high degree 
of precision be uniformly and randomly distributed, so that the x k follow 
Poisson statistics appropriate to a rate \x = {x rn ), with a standard deviation 
o~ Xm equal to yfji. 

Of course these stars do vary. The variability time scales of interest are 
< 1 millisecond, and the detected photon rates 10 2 -10 5 photons/sec, which 
means that the time bins must be chosen such that typically there are 
on average only a few (sometimes <Cl) photons per time bin, and o~ Xm is 
of order /z (sometimes o~ Xm 3> fx). As the intrinsic variability of the star 
often has a relatively small amplitude, only a few percent of the total flux, 
it is clear that we are in a low signal-to-noise regime (with the "signal" 
the intrinsic stellar variability and the "noise" the Poisson fluctuations). 
Fortunately, typical observations span 10 3 to 10 5 sec, so that the number 
of time bins N to t (the number of "measurements") is 10 6 to 10 s , which 
allows us to recover the signal from the noise. The techniques used for this 
are described in the next section. 



3 Standard analysis 

The standard approach (see van der Klis 1989) is to divide the time series 
into M equal- length segments of N time bins Xk, k = 0, . . . , N— 1 each (so, 
ideally N to t — MN), to calculate the discrete Fourier transform of each 
segment: 

N-l 

aj =Y,x k e 2 ^ k / N j = 0,..., N/2, 

k=0 



1 This is of course not exactly true. The background does not pose large prob- 
lems in practice, as it just constitutes an additional source of detections (of pho- 
tons as well as charged particles) that is not strongly variable on the time scales 
we are interested in and uncorrelated to the fluctuations due to the star. Detector 
deadtime processes leading to "missed" photons do constitute a serious compli- 
cation (e.g., van der Klis 1989, Mitsuda and Dotani 1989, Vikhlinin et al. 1994, 
W. Zhang 1995) that will be ignored here. 
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to convert this into a power spectrum for each segment 
Pj = -\ aj \ 2 j = 0,...,N/2, 

do 

and then to average these power spectra (see below). Note that in our 
application a = J2k=o x k = N p h, the number of photons detected in 
the segment. With this power spectral normalization, due to Leahy et al. 
(1983), it is true that if the x/, are distributed according to the Poisson dis- 
tribution, then the Pj follow the \ 2 distribution with 2 degrees of freedom, 
so (Pj) = 2 and <jp j = 2. 

This white noise component with mean level 2 and standard deviation 2, 
induced in the power spectrum due to the Poisson fluctuations in the time 
series, is called the "Poisson level" . It can be considered as "background" in 
the power spectrum, against which we are trying to observe the other power 
spectral features, which are caused by the intrinsic variability of the X-ray 
binary. The physical dimension of the thus defined powers is the same as 
that of the time series: [Pj] — [aj] = [xk\ - Often the Y-axis of plots of power 
spectra in this normalization is just labeled "POWER" , which reflects the 
fact that the physical interpretation of the Pj in terms of properties of 
the star is inconvenient. For this reason, in recent years another power 
spectral normalization has become popular where the powers are reported 
as Qj = Pj/\, with A the "count rate", the number of detected photons 
per second: A = N ph /T, where T is the duration of a segment. The Qj 
are dimensionless, and can be interpreted as estimates of the power density 
Q{vj) near the frequency Vj = j/T, where Q(v) is a function of frequency 
whose integral gives the fractional root-mean-square amplitude r of the 
variability. This latter quantity is defined as 



N-l 



V -kEk=o( x k ~x) 2 _ 1 

r = — where x = — > Xk- 

x N ^— ' 

k=0 

(This follows directly from Parseval's theorem.) The fractional rms ampli- 
tude ri2 due to fluctuations in a given frequency range (^1,^2) is given 
by 

r^2 



ri2 = / Q{v)dv. 

J V-i 



So, the physical interpretation of Q{v) is easy: it is the function whose inte- 
gral gives you the square of the fractional rms amplitude of the variability in 
the original time series. The physical unit used for Q(v) is (rms/mean) 2 /Hz, 
where "rms" and "mean" both refer to the time series; "rms/mean" is just 
the dimensionless quantity r. 

The averaging of the power spectra mentioned above is usually performed 
both by averaging individual power spectra (from different segments) to- 
gether (averaging the Pj 's with the same j from the M different segments) 
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FIGURE 2. Top: An average of 6166 power spectra of EXOSAT ME data on the 
X-ray binary GX5— 1 showing power-law noise and QPO. Bottom: The standard 
deviation of the 6166 power values averaged in each frequency bin. Inset: the ratio 
of standard deviation to mean. The standard deviation equals the mean power 
as expected for \ 2 distributed powers. From Van der Klis (1989). 



and by averaging powers at adjacent frequencies (Pj+i to Pj+w, sav )- The 
main purpose of this is of course to decrease the standard deviation of the 
power estimates, which in the raw spectra is equal to the mean power. The 
reason to calculate many power spectra of segments of the data rather than 
one very large power spectrum of the whole data set, apart from compu- 
tational difficulties with this approach, is that in this way it is possible to 
study the variations in the power spectrum as a function of time. 

The final step in the analysis is to fit various functional shapes f{v) to 
the power spectra using the method of % 2 minimization that is also used in 
X-ray spectroscopy (the Levenberg-Marquardt method described in Press 
et al. 1992, Chapter 15). Because many individual power estimates have 
been averaged in the analysis process, the central limit theorem ensures 
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that the uncertainties of the final power estimates Q are approximately 
normally distributed, as required for this method to work well. A prob- 
lem is what "uncertainties" <Tq to assign to these power estimates. Usually 

°~q — Q/VMW is assumed, where MW is the number of individual powers 
averaged to obtain Q (W stands for the "width", the number of adjacent 
powers averaged; M for the number of averaged power spectra). This is 
approximately correct, and was experimentally verified (Fig.||), in the case 
that the dynamic range of the power spectrum is dominated by the intrin- 
sic differences in the mean amplitude of the star's variability as a function 
of frequency rather than by the stochastic fluctuations in power. If instead 
the stochastic variations dominate then this procedure for estimating the 
uncertainties can lead to severe underestimation of the power, as acciden- 
tally low powers will get high weights in the fitting procedure and vice 
versa. A solution to this problem that is sometimes adopted is to estimate 
the uncertainty in Q as / (V) j V MW , where f(p) is the fit function and z7 
the frequency corresponding to Q. 



4 Applications and results 

The method described in the previous section works. It allows astronomers 
to quickly characterize the variability properties of large amounts of data, 
to study the changes in the properties of the variability as a function of time 
and other source characteristics, and to measure amplitudes and character- 
istic time scales of the variability. The method straightforwardly extends 
to simultaneous multiple time series, for example time series obtained in 
different photon energy bands. For Nb anc i simultaneous time series, it is pos- 
sible to calculate N} >an d{N} )an d — 1) different cross-spectra between them, 
and to look for systematic time delays in the variability as a function of 
energy. A comprehensive description of the methods used for this can be 
found in the paper by Vaughan et al. (1994). 

A very important aspect is the possibility to identify different "power 
spectral components" in the variability. This is done by studying the changes 
in the shape of the power spectra as a function of time and other source 
properties, such as brightness or photon energy spectrum. It usually turns 
out that the simplest way to describe the changes in the power spectrum 
as a function of time is in terms of the sum of a number of components 
whose properties (strength, characteristic frequency) depend in a smooth, 
systematic and repeatible way on, for example, brightness. If this is the 
case, then it is natural to interpret these power spectral components as 
due to different physical processes (or different aspects of the same pro- 
cess) that are all affecting the count rates at the same time, and that have 
been disentangled from each other in the analysis just described. 

Examples of power spectral components that are distinguished in practice 
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BLACK-HOLE-CANDIDATE POWER SPECTRA 




FREQUENCY (Hz) 

FIGURE 3. Power spectra of black hole candidates. The spectrum labeled LS 
(low state) is of CygnusX-1; the other two are of GS 1124— 68. From Van der 
Klis (1995). 

are "power law noise" , "band-limited noise" and "quasi-periodic oscillations 
(QPO)" . All of these are all presumed to be stochastic processes in the time 
series that cause, respectively, a component in the power spectra that fits 
a function /pl(^) = CpLV~ a , one that fits /bljv(^) = CBLNV~ a e~ v l Vcut 
and one that fits a Lorentzian /qpo iy) = Cqpo / {v— z/ QPo) 2 + (r/2) 2 ). For 
a component to be called QPO it is sufficient that the corresponding power 
spectral component has a shape approximately described by /qpo( i/ ) with 
the peak full-width-at-half-maximum (FWHM) T less than half its centroid 
frequency vqpo] of course this definition is arbitrary. 

Figure[3] shows a number of actually observed power spectra, plus the 
functions that were fit to them in order to describe them in terms of the 
power spectral components just mentioned. QPO peaks, band-limited noise 
and power law noise can all be seen. 
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Sometimes the shape of the observed power spectrum is ambiguous with 
respect to the decomposition into power spectral components as described 
here. An advantage of the x 2 minimization method is that it allows to 
quantify the degree of this ambiguity by comparing the \ 2 values of the 
different possible combinations of fit parameters. 

5 Problems at the low-frequency end: 
low-frequency leakage 

At the low frequency end (typically, below 0.01 Hz) of the power spectra a 
number of problems occurs in an analysis along the lines described above 
that is usually not really satisfactorily dealt with. 

One problem is, that in order to reach the lowest frequencies in the first 
place, it is necessary to choose the length of the time segments T relatively 
large and therefore M relatively small. This means that at the lowest fre- 
quencies the statement made in Section ^, that a large number of individual 
powers has been averaged and therefore the average power is approximately 
normally distributed is no longer true. Other (e.g., maximum- likelihood) 
fitting procedures are required to take the true distribution of the average 
powers into account, but such methods are not usually applied. See Pa- 
padakis and Lawrence (1993) for a discussion of a method to remedy some 
of these problems. 

A more serious problem is, that the true power spectrum at these fre- 
quencies often seems to be a quite steep power law. The finite time window 
T in those situations leads to so-called "low-frequency leakage" (see Deeter 
1984 and references therein): power shows up at a higher frequency in the 
power spectrum than where it belongs. One way to see this is by noting that 
the lowest frequency accessible in the discrete Fourier transform is l/T. If 
there is a lot of variability at lower frequencies than this, then due to these 
slow variations the time series of an individual time segment will usually 
have a large overall trend. The Fourier transform of this trend produces 
a power law with index —2 in the power spectrum. Another way to de- 
scribe the effect of low-frequency leakage is by noting that according to the 
Fourier convolution theorem the Fourier transform calculated in the finite 
time window T is related to the true Fourier transform by a convolution 
of the true transform with the transform of the window function. As the 
window function is a boxcar, its Fourier transform is the well-known sine 
function, with a big central lobe and upper and lower sidelobes that grad- 
ually decrease in amplitude. For true power spectra steeper than a power 
law of index —2, the contribution to the convolution of the lower side lobes 
overwhelms that of the central lobe, and the result is a power spectrum 
that is a power law with index —2 (e.g., Bracewell 1965). So, for any true 
power spectrum steeper than —2, the actually measured power spectrum 
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will have slope of ^—2. 

There are well-known solutions to this well-known problem. The most 
famous one is data tapering: instead of a boxcar window a tapered win- 
dow is used, i.e., a window that makes the data go to zero near its end 
points more gradually than by a sudden drop to zero. Other methods are 
polynomial detrending (fitting a polynomial to the data and subtracting 
it) and end-matching (subtracting a linear function that passes through 
the first and the last data point). Deeter et al. (1982) and Deeter (1984) 
have explored a number of non- Fourier methods. All these methods work 
in the sense that to some extent they suppress the side lobes of the re- 
sponse function and therefore they are able to recover power laws steeper 
than with index —2 (the value of the power law index where the method 
breaks down is different in each case). However, typically these methods 
have only been evaluated for the case where the time series is pure power- 
law noise, and in many cases even only with respect to their effectiveness 
in recovering the power law index, not even the noise strength. Some meth- 
ods require that the index of the power law is known in advance! Nearly 
nothing is known about the way in which these methods affect the results 
of fits to power spectra with complicated shapes such as those described 
above. These methods may recover the correct power law index for the 
low-frequency part of the power spectrum, but what will be the effect on 
the fractional rms values and time scales of all the other components? For 
this reason, these methods are not usually applied. 



6 Problems at the high frequency end: what is the 
shortest time scale? 

The high-frequency end of the power spectra, near the Nyquist frequency, 
is of particular interest to astrophysicists, as it is there that we expect to 
find the signatures of the fastest accessible physical processes going on in 
the star. A question that is often asked in this context is: "what is the 
shortest variability time scale r we can detect in the data?". Generally, 
what is observed at high frequency is that the power spectrum more or 
less gradually slopes down towards the Poisson level. The problem is to 
decide out to which frequency intrinsic power is observed above the Poisson 
background, and what this number means. 

An, I think, decidedly misleading way to answer the question about the 
shortest time scale is by determining the shortest time interval At within 
which significant variations can be detected. It is obvious, that for a slowly 
varying source, by zooming in on some gradual slope in its light curve 
I(t), this shortest time interval can be made as short as one wishes, just 
by improving the quality of the data (by increasing e for example). Yet, 
most of the work on "shortest time scales" seems to aim for measuring 
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At rather than r. It seems clear that one must also take into account the 
amplitude of the variations, not just the time within which they occur to 
do something that is physically useful. Following Fabian (1987) one can for 
example define the variability time scale as 

= m, 

I(t) 

where the dot denotes the time derivative. Defined this way, t is a measure 
of how steeply I changes with t, and depends itself on t. Using a power 
spectrum, one would measure some average of this quantity by determining 
the fractional rms amplitude r{vhi) near some high frequency Vh% in the way 
described in Section^ and then write 

r = C x ' 



Vh%r(y hi ) ' 



where C is a constant of order unity depending on assumptions on exactly 
how the variations causing the power in the spectrum took place. How- 
ever, for precise work one has to worry about low-frequency leakage, too. 
Exactly the same problem as described in Section [| can occur here: power 
from lower frequencies can leak up to higher frequencies due to the sine 
response associated with the power estimators. That this problem is se- 
rious is apparent from the fact that the observed power spectrum of for 
example the famous black-hole candidate CygX-1 has an index of ^—2 for 
frequencies above ~10 Hz (see Fig.^), just the value at which low-frequency 
leakage begins to worry us. It would be of great interest to have a foolproof 
way to subtract power that has leaked up from lower frequencies, or even 
to have a way to make a conservative estimate of (obtain a lower limit 
on) the true high-frequency power. It is not clear to what extent this can 
be accomplished. Obviously, as in any convolution problem, some infor- 
mation has been lost, but how much can be recovered is a problem X-ray 
astronomers do not know the answer to. I wonder to what extent standard 
deconvolution procedures might be useful here. 

I note parenthically that a method that has been applied for determining 
the shortest variability in a time series by Meekins et al. (1984) seems to 
suffer from the same problem of low-frequency leakage. In this method, the 
time series is divided up in very short segments of, for example, N = 10 
bins each. In each segment a quantity called "chi-squared" is calculated as 
follows: 



N-l 



2 _ (a* - Nph/N)" 



A _„ N ph /N 

Here Xk is the number of photons detected in bin k and N p h is the to- 
tal number of photons in the segment. One recognizes an "observed over 
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expected" variance ratio for an expected Poisson distribution. The distri- 
bution of this quantity is then compared to that expected if all variability 
in the time series would be due to Poisson fluctuations and if a significant 
excess is found, this is interpreted as detection of variability on time scales 
between the length T of a segment and the duration of one bin. One would 
expect low-frequency leakage to be as much of a problem here as in the 
power spectral method. It is easy to see that if there are variations in the 
time series on time scales much longer than the segment length, the data 
points in the segments will usually follow steep trends, causing large val- 
ues of XMeekins that are not related to variability on time scales shorter 
than T. Indeed, Mcckins ct al. show that XMeekins ^ s closely related to the 
Fourier power in the segment, so from the point of view of low-frequency 
leakage the Meekins et al. method is less effective than the standard power 
spectral approach, as it requires T to be quite short and therefore increases 
the probability of steep trends in the data segments. 
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